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(54) Title: METHOD OF PRODUCING A DIGITAL SIGNAL PROCESSOR 
(57) Abstract 

In producing digital signal processors generation of behavioral 
models (18) is used. Such a processor is constituted of hardware 
modules like computational units and a control unit. The behavioral 
model (18) is a description of the processor hardware and the 
program to be executed by the processor, the model being in 
executable program code written in a HLL programming language. 
First an architecture of the processor or parts thereof is determined 
(I) using predetermined templates (17). A layout description of 
the corresponding circuit is made (21) based on the determined 
architecture, this layout description being used to produce the layout 
of the corresponding circuits, so that a semiconductor chip can be 
processed according to the layout. When making a layout description 
of a processor part the structure of the processor or part thereof is 
coded in a textfile (3) as definitions of variables and operations and/or 
functions which are to be performed using the variables as arguments. 
These definitions are interpreted (21) to form the respective registers 
and units performing the functions and lines connecting the registers 
and the units. The textfile (3) is also used for making the behavioral 
model (18) that in turn can be used for simulating (18) the processor. 
A very rapid way of constructing adapted signal processors is obtained 
hereby, the method being used by a circuit designer interactively with 
computer software. 
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Method of producing a digital signal processor 

The present invention relates to a method of producing programmable processors and 
also a method using a special technique for generation of behavioral models for pro- 
& grammable processors. 

BACKGROUND OF THE INVENTION AND STATE OF THE ART 
Future telecommunication systems will put high requirements on flexibility, short 
design time and design reuse for digital signal processing systems due to the fast changing 
market. Other requirements are low power consumption and small silicon area. For the 
o portable device market, there will be battery powered terminals for speech, data, image and 
video communication. In stationary systems the integration will increase, which leads to 
higher power density in the systems. The power dissipation problem has to be attacked on all 
levels from system level, algorithm, architecture down to the devices and the semiconductor 
manufacturing process. 

s The price/performance, required by different applications is not possible to achieve 
using standard digital signal processors (DSPs). Thus, a DSP in those cases has to be 
specifically designed for the actual application (ASDSP, Application Specific DSP). Then, a 
possibility may be to use the processor core of a standard .DSP together with memories for 
data and program, which are customized for the actual application. Input/output pons and 
peripheral functions can be designed as blocks on the same chip (Custom DSP). 

The limitations associated with a custom DSP approach is fixed wordlength, fixed 
functions and the speed of the core, limited by the architecture. The architecture of a 
complete standard DSP core is instead designed to be able to handle many different 
applications, resulting in not used silicon area for many applications, which may have special 
needs for functionality and wordlength. 

A class of DSP is designated the algorithm specific DSPs. The architecture and 
functionality in this class of DSPs are designed from the actual algorithm to be implemented, 
and consideration is taken of the system requirements, in regard of speed, silicon area, 
power dissipation and flexibility. DSPs of this class are designed by means of automatic 
compiler tools or pure full-custom design comprising conventional tools for silicon design. 

Present techniques of designing DSPs include those which use a behavioral 
representation that specifies the algorithm that is implemented in the processor as a program, 
and a structural description of the processor hardware modules and their logical connections. 
One of these systems is the LAGER system developed at Berkeley University, see Shung, 
Jain, Rimey, Wang, Srivastava, Richards, Lettang, Azim, Thon, Hilfinger, Rabaey, 
Brodersen: "An Integrated CAD System for Algorithm-Specific IC Design", IEEE Trans- 
actions on computer-aided design, vol. 10, No. 4, April 1991. 

A second system has been developed at the University of Lund, Sweden, see Mats 
Torkelsson: "Design of Application Specific Digital Signal Processors", Ph.D. thesis, Lund 
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University, Sweden, June 1990, Viktor Owall: "Synthesis of Controllers from a Range of 
Controller Architectures", Ph.D. thesis, Lund University, Sweden, December 1994 Chapter 
8, pp. 102 -. This system is similar to the LAGER system and uses a behavioral description 
of the processor architecture to extract a set of micro-operations. The extracted 
s micro-operation set constitutes the available operation set, used for the micro-programming 
of the processor. The microcode program is a description of the algorithm which is 
implemented, and additional declarations such as memories. 
DESCRIPTION OF THE 'INVENTION 

There is thus a need for a more semicustom oriented approach than that used in the 
io prior art. A design environment should be provided that will fill the gap between the 
automated compiler tools and the traditional full-custom DSP design. In the design procedure 
it is important to take advantage of all the benefits in the full-custom approach and to use 
them in a semicustom approach. The automated design used by the compilers, scheduling 
and resource allocation is difficult to make efficiently in a computer program. A key concept 
is should then be to put the designer in place, to analyze the algorithm and create the 
architecture manually and use the tools as an interactive aid in the design work. 

It is thus an object of the invention to provide a design-environment for design space 
exploration of processors. 

It is a further object to provide a partitioning of an algorithm to one or several 
20 computational units or processors and also to provide a possibility of doing performance 
estimates. 

Thus an iterative process can be provided comprising analysing the algorithm, defining 
the architecture, writing microcode and exploring the design space by simulation of the 
microcode. 

The invention is thus based on a methodology for generation of behavioral models for 
programmable processors. Such a processor is constituted of hardware modules like 
computational units and a control unit. The processor executes operations from a program 
stored M the control unit. The behavioral model is a description of the processor hardware 
and the program executed by the processor. 
» The main advantage of the suggested approach is the modelling of the processor and 
the microcode-program to be executed in the processor. The structural model of the 
processor architecture and the microcode-program (a subset of C) are included in the one and 
same textfile. 

The microcode-program is an implicit description of the architecture and by using the 
* present methodology there is no need for an explicit structural description of the processor 
architecture. 

The environment supports the implementation of an algorithm into a micro-pro- 
grammed application specific processor architecture. 

The variable names in the microcode match the instance names in the graphical 
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architecture description. Assignment of data between different variables (registers, memories) 
corresponds to the connectivity in the architecture. The microcode is written in a high-level 
assembly language which in a preferred embodiment is a subset of the C-language. 

This single format behavioral model of the processor and the associated microcode 

5 program can be linked together with "bit-true" simulation models from a library, where e.g. 
C++ overload operators can then be used. The linked code can then be used as the be- 
havioral simulation model on an event-driven simulator or executed as a plain high-level 
language program. Providing an asynchronous I/O interface on the processor it will be 
possible to run several processors concurrently in the simulator, communicating by means of 

io asynchronous hand-shaking: 

This way of writing microcode and simulating gives a very flexible way of mapping an 
algorithm onto a processor structure. It is easy to make changes in the micro-program code 
by means of a text-editor and then resimulating. The architecture description can be 
graphically visualised on a sheet of paper, as a support for the application engineer. There is 

is no need for a computer readable architecture because of the implicit description of the 
architecture in the microcode. 

Traditionally, at the company Ericsson the microcode has been generated into a 
Register Transfer Level (RTL) model and simulated together with the hardware model. The 
drawback of this approach is that the microcode is not verified before it is included in the 

20 hardware RTL-model. When verifying the hardware it is difficult to say whether the errors 
are in the hardware, or software, or both. The simulation time on this level is also several 
magnitudes longer than doing simulations on the behavioral level. 

Using the tools like LAGER, discussed above, the microcode has been generated from 
the algorithm by a compiler, doing resource allocation, scheduling and assignment. The input 

2s to the compiler has been the structural and behavioral descriptions. This way of auto- 
matically generating a processor model is not very efficient in terms of hardware area, 
power consumption and cycle count. Therefore a more manual way of analysing the 
algorithm and writing microcode programs is preferable (high level). 

In the prior art, input is needed from both the processor architecture such as a 

30 structural description, an extraction of operations and from the algorithm (the microcode pro- 
gram) for generating the complete processor model. The prior approach requires tools for 
extracting the operations needed for the processor model. 

The steps of the design process of integrated circuits can be partitioned into two major 
groups, the functional design and the implementation of the circuit into hardware. The design 

35 process can be characterized as both Top Down and Bottom Up. This means that the in- 
formation flow between the two levels is bidirectional. There is a Top Down approach from 
the algorithm level down to the logic level with a refinement process from level to level. 
There is also Bottom Up flow of information of what is available in the target library in 
terms of logical and arithmetic functions and the performance of these functions (speed, area, 
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power c„„ s „ mplion , The prese „, invemion . . n ^ func . ona| ^ 

mcdemng of an app„ca,,on specific processor (ASP) and associated microcode program i 
a behavioral model. F e mo 

BRIEF DESCRIPTION OF THE DRAWINGS 

5 , *J* inVenti °" bC deSCribCd in dCtail ^ w ^ of "^-limiting specific 

embodiments with reference to the accompanying drawings in which- 

signal^esL" " ^ "T™ ^ " in P"*** a dig.^ 

i. nxKMeT* ^ " 3 SChematiC ViCW ^ Pr ° CeSS bl ° CkS ° f 3 PrePr ° CeSSOr ^ 2 generati0n 

- Fig. 2b is a How diagram illustrating the steps executed by the preprocessor and 
generation module of Fig. 2a, pressor and 

- Fig. 3 is an example of a top level template of a processor, 

- Fig. 4 is an example of the module level of a processor, 

- Fig. 5 are examples of a control unit template and an address path template included 
m a processor illustrating registers and individual lines, 

- Fig. 6 is an example of a data path architecture as specified in a textfile 

DESCRIPTION OF A PREFERRED EMBODIMENT 

diff H f ', 1 1 3 SChematiC PiCtUrC deSCribing inf0rmati0n flow »d the relations between 
differ levels in the construction process of a digital signa, processor comprising design " 

and model hng the processor in a functional design environment and producing the necessary 

hardware for manufacture of the processor on an ,C-chip. The start step is an archhectur* 

» .dea indicated at 1 which is the menta. picture or idea of the architecture of a processor that 
" to be produced. This architecture does not have to exist as a graphical picture on paper or 
a display screen of a computer but may be thought of as some imagined apparatus 
performing some required functions. 

A microcode program is written in a high level language (HLL) and the result is a text- 

30 file indicated at 3. The code is written after analysing the functions which are to be executed 
by the processor, as indicated by the block -Algorithm" at 5, and also the desired or 
reqtnred arch.tecture of the processor, as indicated by the block "Architecture templates" 7 
The microcode program specifies implicitly the entire functional construction of the pro^ 
cessor, that ,s all registers, memories, units for performing operations, lines and buses 

as between the components, input/output devices, external lines and buses, etc The syntax and 
operations/operators of the microcode textfile must conform to rules, definitions etc as 
stored m a reference library 9. ' 

The definitions in the reference library 9 correspond more or less directly to behavioral 
models as stored in a library 11 which comprises "bit-true" models, also represented in a 
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high level language. The reference library 9 contains an operation set with associated 
behavioral models as appears from the following table. 



Models 
Operation set 



Processor template 
Control unit 
templates 
io Behavioral models 



Representation/Function 
Primitive operations (see below) 
Data path functions (see below) 
Address and control path functions 
Sheet of paper or computer stored 
Sheet of paper or computer stored 

Bit-true simulation models 



Format 
Text file 



Graphic 
Graphic 

HLL 



The behaviour models of the library 11 in turn correspond, also more or less directly, 
to various representations stored in a target or VHDL library 13. For each operation and 
function there will thus be a hardware implementation in the target library 13. Performance 
like speed, area and power dissipation and other parameters and associated information can 

is be accessed in the actual target library 13 and transferred into the behavioral model. 

The target library 13 can only comprise the VHDL (RTL) model, and the decision on 
the actual hardware implementation for producing a desired silicon chip can be taken in a 
later phase in a design project. If this is the case, no absolute values of area, speed or power 
consumption can be taken, only relative estimates can be made. 

20 A complete design environment consists of two special environments. The First one, the 
functional design environment is the one that is important herein. The second one is the 
implementation design environment which is a traditional design environment for silicon 
ASIC design. 

The interface between these two environments are the VHDL models, which are 
2s models of the processor hardware on a register transfer level, RTL. The models exist for 
registers, memories, arithmetic functions and control unit template and are standardized. 

Specifically, the target library or VHDL library 13 thus contains models in VHDL 
format at the Register Transfer Level (RTL). The models are modules of the DSP hardware 
and structural templates for different levels. 
30 The top level consists of the signal processor core including I/O definitions, as will be 
described with reference to Fig. 3. 

The second level, as will be described in detail with reference to Fig. 4, is built of 
instances placed in the core. A core consists of one Control Unit module, CU, and an 
internal or external Program memory, Pmem. The following modules can have one or 
35 several instances, Address Path AP, Data Path DP and Memories Mem. A core can thus 
have several DP modules and/or AP modules. Each of the modules have one or several 
templates in the structural VHDL description. 
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The third level is the lowest level which consists of symbols like Adders, Registers 
Multipliers, Shifters, ... and more complex functions added to the library for special 
purposes or needs. These modules and memories are modelled in VHDL as ENTITY and 
ARCHITECTURE descriptions, these words in capitals being reserved words in VHDL. 

The microcode program is written in the high level language as specified in the 
reference library 9. The code is written after the analysis phase of the algorithm and the 
architecture definition, and a model 18 of the processor and the program executed on the 
processor, will be the result in the textfile. To be able to execute the model on a computer 
the bit-true "Behavioral models" of block 11 have to be linked into the model, and also 
-o functions for the Input and Output of data together with synchronisation functions. 

The completed "Behavioral model" 18 can then be used for simulation 19 and as a 
source for the generation 21 of hardware. 

To be able to use the model as specified by the textfile 3 it is processed by a pre- 
processor program 15 run on a computer. In the execution of the preprocessor program 15 it 
.swill need the bit true behavioral models in the library 11, as referenced by the operations in 
the textfile 3, performance information from an implementation environment 22, if available, 
and also, for simulation purposes, functions for controlling the input and output of suitable 
signals simulating exterior stimuli, the main code block in a external control block 17, to 
wrap the behavioral models and various functions and include files into a complete program 
20 in 18, that can be linked into an appropriate simulator 19. 

When the various items are needed, they are linked to the main program 18 as 
produced by the block 15, which is the complete behavioral model of a processor core 21. if 
they are in the shape of computer stored program routines or picture details, and" otherwise 
they may be suitably activated. 

The complete textfile as stored in the block 3 is thus used by the preprocessor program 
15 for producing a program for simulation of the operation of the designed processor to be 
run in the block 19, and also as a source for the generation of hardware and software, as 
indicated by the block 21, see the picture of Fig. 2a illustrating the interrelation of the pre- 
processor 15 and the generation module 21 and showing the program parts associated with 
M these modules. A corresponding flow diagram is shown in Fig. 2b. 

The preprocessor 15 performs the following steps: After a start block 201 the textfile 
stored in 3 is read in a block 203. Then the textfile is checked for syntax errors in a block 
205, see also block 15a of Fig. 2a. If errors are encountered as tested in a block 207, they 
are indicated by displaying or generating appropriate errors messages, see block 209 and 
35 15b. Then the preprocessor will halt the operation in a block 211, awaiting a new textfile to 
be processed. If the textfile has a correct form it can be processed further, for example in an 
optional step 15c or a block 213 where a graphical picture of the architecture as represented 
by the textfile is produced, e.g. manually on a piece of paper. Then this picture can be 
approved by a comparison to the intended architecture, as for example represented by 
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graphical pictures showed on the monitor of a computer and a netlist, as indicated by the 
block 215. The comparison can be made automatically by the computer or visually by the 
processor designer, the designer then indicating by some manual input operation whether the 
architecture represented by the textfile matches the desired architecture, 
e If the textfile is formally correct, the textfile can be processed further by the pre- 
processor for producing the behavioral model and thus a base for a simulation of the 
architecture, as is indicated by the block 15d of Fig. 2a and by the procedural steps 221 - 
225 of Fig. 2b. Thus in the first step 221 the textfile is converted to the real C + + -code, 
meaning substantially that parallel operations, as will be described more hereinafter, are con- 

io verted to be performed serially or consecutively, one after the other. For simulation 
purposes, instructions for cycle count and measurement parameters are added to the code. 
Then, also operations representing I/O and external, asynchronous control are added, as 
obtained from the block 17 of Fig. 1. For forming a complete executable program, a main 
program, ,, main() , \ is added in the conventional C-programming style. Then the C++-code 

is is compiled in a step 223 and the compiled program is linked to other program modules in a 
step 225. Then an executable program has been obtained, that in a sense can said to be the 
"behaviour model" of block 18 of Fig. 1 and also can be input to the block 19 for simulating 
the architecture defined by the textfile in 3. 

The VHDL generation part as indicated by the block 21 of Fig. 1, generates a 

20 structural, implementation independent VHDL model, including data path, control unit and 
program memory, from the microcode and architecture description, as originally specified in 
the textfile of block 3. The procedures performed in the block 21 are illustrated by the sub- 
blocks 21a - 21 d of Fig. 2a and the steps 231 - 239 of Fig. 2b, all the operations being 
based on the textfile in 3. 

as The generation of the actual microprograms to be executed in the signal processor to 
be produced and especially in the Control Unit module thereof is made in the block 21a of 
the generation block 21. This generation can possibly use some intermediate convened file 
comprising microinstructions as produced by the preprocessor 15 in the block 15d and sub- 
step 221, or else use the original textfile stored in 3. Thus in the step 221 generally 

30 information is extracted from the textfile in 3 so that operations can be mapped on the actual 
instructions of the language to be used and so that a symbolic microcode can be generated. 
In the successive substep 233 the microcode obtained is converted a suitable table in the 
VHDL-format containing e.g. a binary table comprising a sequence of the ASCII-characters 
T and "0". This table thus comprises the program that is to go into the VHDL model for 

as the program memory and is the microprogram to be executed by the CU. 

Information on hardware requirements such as wordlength, registers, operations, 
memory, the connection lines therebetween, etc. is extracted from the program code in 3 and 
used for generation of the processor core in a block 21b, as illustrated also in the substep 
235 of Fig. 2b. This extracted, combined information can be output or displayed as 
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generator information to the processor designer who can use it for manual editing or manual 
generation, meaning that then the process is stopped after the step 235, awaiting some signal 
for continuing the sequential procedural steps. More particularly, the processor core 
include I/O definitions is generated from the textfi.e and therefrom the more detailed 
5 architecture including associated operations, registers, memories are used for forming the 
control un.t and the appropriate APs, DPs, Mems and how these modules are connected 

Then „ , a block 21c and a step 237 the VHDL-model of the processor core hardware is 
generated and it ,s thus a structural, implementation independent VHDL model, including 
data path or paths, address path or paths, control unit and program memory, from th! 
10 architecture description of block 21b and the substep 235. 

Finally, in the block 21d and substep 239 the hardware description of the processor 
core an the VHDL-microcode tab,e are Hnked or combined in a b.ock 21d and the subl 
239 to form a complete VHDL-description. Then a structural VHDL architecture netlist is 
also used, which is a structural specification of hardware according to the VHDL-standard 
-s This structural specification can be extracted from a graphical representadon or from a 
VHDL-template and comprises in particular the connections of the control unit module CU 
which cannot be extracted from the textfile. 1, is thus a netlist of the connectivity 0 f the 
modules of the processor, as illustrated in Figs. 3 - 6. In the linkJng process naturaHy aJso 
templates from the VHDL-lib 13 are used. Thus an implementation independent VHDL- 

20 v»n ,S , 0btained ' th2t " USCd 2CtUa,ly Pr ° dUdng thC Sign3] >™ TT« block 21d 
VHDL lowest level models, that is RTL mode.s for Registers, Arithmetic functions 
memories, wh.ch are referenced from the VHDL library in 13. are linked together fir* 
internally to the modules DP, AP and Mem and then at the top level, that is inc.udin* the 
I/O- unctions. The VHDL net.ist links the different modu.es. compare items DP 1 N 

« APJ..N, Mem_l..N, Pmem and CU, as illustrated in Fig. 4. ~ " ' 

The program memory Pmem may be implemented as ROM or RAM, internal or 
external to the core. 

The objectives in simulating the microcode are (the simulation is made by generation of 
executable code in block 15d, this code being equivalent to the actual microcode to be used 
30 that is generated in block 21): 

- verification of the microcode. 

- obtaining the number of cycles required by the application. 

- estimating power consumption. 

- estimating the physical area required on the chip to be produced. 

M , , Th ' arChiteCtUre ° f 3 P rocessor t0 be ^signed is represented in three hierarchical 
levels. The top level consists of the processor core 23 comprising I/O definitions, see the 
graphical representation of Fig. 3. 

There are thus inputs for interrupt signals, INT_A ... INT_Z, an input for a reset 
signal RESET for resetting the processor core, inputs/outputs for data transfer, DATA, for 
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connection to some suitable data bus, and various other inputs/outputs carrying signals that 
are needed by the processor for the exterior communication. 

While the internal design of the processor core assumes a synchronous design style, 
asynchronous handshaking is supported for the I/O communication external to the core. An 
6 algorithm or a system that the processor is to perform or implement, can be divided on 
several processor cores on the same chip or different chips. The asynchronous handshaking 
and interrupt/acknowledge signalling make it easy to partition the system on several 
processor cores and to make design space exploration by behaviour simulation. 

Functions like PCM-coding, FFT (Fast Fourier Transform) or DCT (Discrete Cosine 

io Transform) can either be external or internal to a processor core 23. If a function is internal 
(i.e. must fit within a datapath module) all the models have to exist for the function in a pro- 
cessor library (behavioural models) and an implementation of the function has to exist in the 
VHDL library 13. If a function is external to the processor core 23, there are no other 
requirements on the function other than I/O communication and asynchronous handshaking. 

is For simulation purposes, of course the external control block should then be capable of pro- 
viding the correct signals. 

The second level, see Fig. 4, is built of instances or modules placed in the core 23. A 
core 23 thus comprises one control unit module 31, CU, and an internal or external program 
memory 33, Pmem. The following modules can have one or several instances. Address path 

20 35, API, AP2, APn, data path 37, DPI, DP2, DPn and memories 39, Meml, 

Mem2, Mem3. A core 23 can have several DP modules 37 and/or AP modules 35. This 
is a trade-off that the application engineer/designer has to do when he defines the 
architecture for a core 23, that is either to partition the algorithm on several DPs 37 in the 
same core 23 or to use several cores 23 to partition the algorithm onto. 

25 Each of the above modules has one or several templates. A template is a module which 
is predefined to a certain extent. Fig. 5 shows an example of the templates for the CU 31 
and the APs 35, where the pans drawn in solid lines form the basic structure and the dotted 
lines is an example of how to use the templates to build an actual CU with an associated AP 
for a certain application. 

30 The purpose of such a template is to put limitations on the modules and to keep a 
structured design methodology. This will simplify the pipelining and timing analysis for 
implementing the structure on a silicon chip. Thus the main templates should be specified so 
that they have a balance between limitation and flexibility in the architecture. 

Templates like those illustrated in Fig. 5 are used for writing the textfile stored in the 

35 block 3 and then the definitions as stored in the reference library 9 are used. This will form 
a third level which is the lowest level and comprises symbols like adders, registers, 
multiplier, shifters, ... or more complex functions added to the library for special purpose 
needs, where the library comprises the reference library, the behavioral models and the 
VHDL library. The primitive operations/functions and other functions must correspond to 
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the operation code set in the reference library 9. 

The reference library 9 contains the declarations and the syntax information of 
arithmetic operations, or as defined here, the primitive operations/functions that can be 
executed m the datapath and addresspath. The operations are: 

s - Primitive operations 

" + \ " + + ", -<<- f ->>- 

- Primitive functions (function modules in HW). 
DP: abs(), norm(), div(), dctO, fft() 
AP: buff(), end buff(), buff_p() 
io - Dataflow (assignment operator) 



where abs() means the function of taking the absolute value, 
norm() means normalization, 
div() means division, 
15 dct() the discrete cosine transform, 

fft() the fast fourier transform. 
These are only examples of functions which can be included in the datapath. 
An expression to be used in the textfile as stored in the block 3 can have the form: 

AccL.n = ±(a + b); 

where the accumulator to the left is a register Accl. Acc2 Accn, not shown in 

one of the datapath modules DP and can be assigned, as indicated by the symbol » = » the 
sum or difference of the registers "a" and «b". The registers are declared as variables in the 
textfile stored in 3. 

The primitive operators "<<« and »>>" are arithmetic left and right shifts 
26 respectively. A constant "k" or variable determines the number of shift steps: 

AccL.n = ± c < <k (AccL.O ± b); 
AccL.n = ± c > >k (AccL.O ± b); 

The assignment of data describes the dataflow inside the processor. Data can be 
assigned (transferred) between registers, memories and Input/Output registers. 
30 The functions can be called as an ordinary function in a HLL: 

AccL.n = abs(a); 
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Here the absolute value of the content in a register M a" is loaded in an accumulator 
Accl, Accn. 

The functions for a datapath 37 and an addresspath 35 can be different and are 

normally so. The AP 35 has for example a circular buffer function for calculation of the 
s address pointer for a circular buffer stored in memory: 

const int *buffj)(i mem_A) = adr ;Init pointer i 



buffer(i mem_A start stop)- ;Circular buffer 

10 { 

(statements) 

mem_A + + increment pointer 

(statements) 

} 

is end_buff(i mem_A) ;Store pointer on stack 

The control flow in the program is controlled by the program counter, jumps, 
conditional jumps and loops. The increment of the program counter is not explicitly 
expressed in the program code. The counter is incremented by default for each statement in 
the program code, unless it is overridden by any of the control functions. However, each 
20 statement is ended in the program code by a semicolon ";'\ which can be seen as indicating 
an increment of the program counter. 

Parallel operations can be executed in a microinstruction or statement (concatenation of 
several microoperations). The parallel micro-operations are separated by a comma V in the 
program code. 

25 The operations controlling the dataflow in the processor are: 

- Conditional operations 

op> ::= < , > , <— , > - , , iu 

- Conditional statements 
if(), else 

3o - Loops 

for(), for(;;) 

- Unconditional statements 
goto label 

- Control functions (implemented in the control unit) 
35 hold(), wait(), pwd() t en_int(), disabjnt() 

All the conditional and control functions are implemented in the control unit hardware 
and have not to be implemented in the reference or behavioral libraries 9 or 11, more than 
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possibly the syntax thereof may be declared in the reference library 9. This approach of 
modelling processors takes advantage of the actual high level language where these 
functions are all included implicitly, and does not need any separate modelling of the 
functions on the behavioral level. 
■> A conditional statement can be of the form: 

if(a <op> b) ;Conditional Jump 

{ 



} 

else 



} 



or 



if(a <op> b) jingle operation 

< operation > ; 

A loop control function is written as a "For" loop: 

20 for (k = m; k > = 0; k~) ;Load loop counter 

{ 



;Jump to decrement loop counter 



2S 



or 



for (k = m; k > = 0; k~) ;Single repeat 
< operation > ; 



or 



for CO infinite loop 
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The loop counter "k M is located in one of the addresspaths 35, and is loaded with an 
initial value "m" and is decremented to zero. This is a restriction on the function, the loop 
counter can only be decremented to ze/o. It is more efficient from a hardware point of view 
to detect a negative state. This is a minor drawback for the programmer and can be handled 
6 easily. 

The unconditional jump instruction is loading the program counter with the program 
address of the label: 

goto label ;Jump to address 

The following control functions affect the program flow, holding the program counter 
io or loading it with a new value (label): 



hold(n) 

wait( < label > ) 
pwd( < label > ) 
en_int(A...Z) 
15 disab_int(A...Z) 



;Stop execution n cycles t 
;Wait for wakeup 
;Power down 
;Enable interrupt 
;Disable interrupt 



The external interrupts are controlled by two functions Enable and Disable. The para- 
meters are the signals on the interrupt input ports INT_A, INT_Z. There can be any 
number of interrupt ports from zero to as many as required by the application. The wait and 
pwd functions are terminated with an external interrupt or reset signal. 

20 The microcode program which is stored in the block 3 is written using a conventional 
text editor and utilises a semantic model of the processor and the microcode program, exe- 
cuted on the target processor. 

The transformation process from the algorithm description down to the textfile in the 
block 3 is done using manual evaluation. The process is iterative and can be defined in the 

2s following way: 

- Analysis of the algorithm, block 5, and system performance. 

- Definition of the target architecture, block 1, based on processor templates, block 7. 

- Writing program-code, stored in 3, based on the declared primitive operations and 
functions in the reference library 9. 

30 The process of writing/editing the code in the textfile and updating the architecture is 
repeated until an acceptable result is achieved. The code is first syntax checked in the pre- 
processor and then used as the source code by the preprocessor program of block 15 in the 
following steps: 

- A syntax check of the textfile by calling the definitions of the reference library 9. 

35 - An optional check of correspondence of the processor as specified by the textfile with 



WO 97/13209 

PCT/SE96/01236 

•14 

the contemplated architecture. In the preprocessor 15, for example, a graphical illustration of 
the logic construction of the processor or components thereof can be generated on a display 
screen. However, preferably the architecture can be generated manually by the designer as a 
graphs illustration of the logic construction of the processor or components thereof on a 
s display screen. 

- A preprocessing of the textfile source code, 

- - Serializing parallel operations, that is substituting for \\ 

- - Adding performance measurement parameters, as taken from the block 22. 

- Setting up, if necessary for simulation or performance tests, I/O, external control in 
io the block 17 (core wrapper),. 

- Compiling source code module to relocatable object code. 

- Linking of behavioral, "bit-true" functions. 

The linked compiled object code can be used for simulation on an appropriate 
simulator. rv v 

The program code of the textfile stored in the block 3 comprises two parts the 
declaratton part and a body with program statements. In the declaration part the allocation of 
memory is specified. Memory in this case comprise RAM. ROM and registers The registers 
can be of three different types: input, output and register. The input and output registers 
have to be specified separately because of the I/O handshaking. 
20 Below is an example on a declaration: 

REG Mereg[16], Mareg[12], Sacc[22], Acc[22]; 
INPUT Ireg[16]; 
OUTPUT Oreg[16] ; 
RAM Mem 1 [256, 16]; 
ROM Mem2[1024. 16]; 

where the first line specifies typical storage elements and the last two lines specifies a typical 
worlang storage and storage fixed data respectively. The middle lines specify input/output 
registers. The program memory Pmem must be specified otherwise. 

The architecture for the above declaration can be seen in Fig. 6 illustrating the 
so architecture of a datapath 37. 

Beside the storage, that is declared above, there are two primitive operations used in 
this arclutecture, multiplication - and addition ■ + -. The connectivity or the static data 
flow ,s described by the arrows and the bullets in Fig. 6. An arrow indicates an input , 0 the 
module and a bullet indicates that the output of a module is connected to the bus. 

The dynamic data flow, see Figs. 7a and 7b, on the other hand is defined by the 
statements m the program code and is a flow from the right-hand side to the left-hand side in 
the program-code of the textfile at a specific time, when the statements are executed This is 
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the same as the "bullet to arrow" notation when a transfer takes place on the bus. The 
statements on the other hand are executed from the micro-instructions in the program 
memory as controlled by CU and are normally defined as the control flow. 

In the architecture all the storage elements has an instance name and this name must be 
5 the same as the variable name in the declaration list. To summarize the declarations: 

- Primitive operations are declared in the reference library 9. 

- Storage elements are declared in the program textfile. 

An instance of a primitive is identified in the architecture by its connectivity. The 
add-operation " + " in Fig. 6 is identified by the connection to Ireg, Mereg on one of the 
io operand inputs, and the connection to Acc_bullet and Sacc on the other operand input. The 
output is identified by the connection to the input of Sacc. This is the same as the semantic 
description: 

Acc = Acc + Ireg; 

The output here is the assignment to Acc on the left-hand side of the expression. 

is However, there can be an ambiguity in this way of identifying the operators, if we 
have two, exactly the same operators connected to the same set of registers for both the 
inputs and the output. Then it is not possible to decide which one to use. This is not a 
normal way of doing design and should be avoided. There can for example be two similar 
operations which are connected in parallel. They can be distinguished by using different 

20 symbols for the operation. Normally, when using parallel operations of the same- type, they 
are at least connected to different output registers. 

The program code of the textfile describes the data/control flow in the processor to be 
produced and is in the same way a semantic description of the connectivity of the processor 
as discussed above. Another way of expressing the relationship between program code and 

as processor architecture is the fact that a computer program is an implicit description of the 
processor architecture for low level programming like Assembler programming. This fact is 
exploited here, to put restrictions on the architecture, by using templates and to have strict 
rules for the program coding, by using a reference library 9. Below is an example of a pro- 
gram written for the target DP-architecture of Fig. 6. 

30 From the declaration of the storage element we can see that the wordlengths of some of 
the elements are different from that of others. The evaluation and generation program and 
the hardware assumes most significant assignment of a word if the left-hand side has a 
shorter wordlength. If not most significant assignment, the bitfield for the variable has to be 
specified, e.g.: Mem = Acc[6, 21]; 

35 The address calculation "adr" and the loop control variable (register) M k H is based on 
the controller template shown in Fig. 4. 

The operations in bold are operations executed by the data path 37, the other 
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operations or statements specifying functional details. 

As can be seen from the program example below, the architecture of the data path 37 
can be extracted from the code. This is also the case for the addresspath 35. 

for(;;) // Infinite loop 

{ 

adr = 0; //Address register is not included in DP 

for(k = 127;k > = O; k-) 

{ 

MemA = Acc; //Write to memory 

'° Mareg = MemA, //Read from memory 

adr = adr + 127, //adr register incremented in AP 

Oreg = Acc; //Accumulator to output 

Mereg = MemA, //Read from memory 

adr = adr+ + ; 
15 Acc = Acc + Mereg; 

MemA = Acc; 

Acc = Sacc + Ireg, //Add Sacc and input 
Sacc = Mareg * Mereg, //Multiply 
adr = adr- 1 27; 

» To summarize the important features of the method of generating a behavioral model 
it can be expressed as follows: 

- Definition of the target architecture, based on processor templates. 

- Writing program-code in a single textfile description based on declared primitive 
operations and functions, and a set of syntax and programming rules stored in a reference 

25 library. 

- - Primitive operations and functions are declared in the reference library. 

- - Storage element are declared in the program textfile. 

- One to one mapping, transformation between abstraction/model levels without any 
synthesis taking place. There is a one-to-one relation between the levels- 

- - - Reference library/textfile Operations-Functions/Variables 

- - Behavioral bit-true model Implemented functions/Variables 

- - Target architecture VHrM ... 

VHDL library/Instances 

- Identify primitives, functions and instances in the target architecture from the 
program code, based on the fact of the implicit description of the target architecture. 
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CLAIMS 

1. A method of producing a digital signal processor having a predetermined structure 
and predetermined functions comprising the steps of 

- determining an architecture of at least one processor part, 

5- making a layout description of the corresponding circuits based on the determined 
architecture, 

- making the layout of the corresponding circuits based on the layout description, 

- processing a semiconductor chip according to the layout, 

characterized in 

io - that in making a layout description of a processor part the structure of the processor part is 
coded as definitions of variables including the particular characteristics of each variable, the 
variables indicating registers in the processor, and further as operations and/or functions 
which are to be performed by the processor part using the variables as arguments, and 

- that these definitions of variables and operations and/or functions are interpreted to form 
is the respective registers and units performing the functions and lines connecting the registers 

and the units. 

2. A method according to claim 1, characterized in 

- that in making the layout description the definitions and the operations and/or functions are 
written as textfile, and 

20- that the textfile is interpreted sequentially or consecutively, starting from the beginning of 
the file. 

3. A method according to claim 1, characterized in 

- that the definitions of variables and operations and/or functions are made for essentially a 
whole processor, and 

25 - that when making the interpreting, a basic, predetermined structure of the processor is 
used, the processor thus having at least a control unit associated with a program memory, at 
least one address path module, at least one data path module and at least one memory 
module. 

4. A method according to claim 3, characterized in that the occurrence of an 
30 arithmetic operation in the definitions of variables and operations and/or functions signals an 

arithmetic unit in a data path module, the variables being operated on being interpreted as 
registers in the data path module. 

5. A method according to claim 3, characterized in that the operations and/or 
functions comprise a special operation for performing in parallel or simultaneously 

35 operations and/or functions linked by this special operation, the occurrence of this special 
operation in the definitions of variables and operations and/or functions signalling that the 
processor is to comprise at least the same number of arithmetic units, each one in a data path 
module, the variables being operated on being interpreted as registers in the respective data 
path module. 



WO 97/13209 

PCT/SE96/01236 

1/6 



Reference Library 

Operations Set - "HLL" 



Behavioral Model BniMin g 

5 



t 



Behavioral models 

"bit-true" HLL 



t 



13 



VHDL Library 



Library 
• VHDL mode) 



Relations 
Information 



Optional 

Fig. 1 



^ Algorithm 




Generation of 
HWandSW 



A rchitectureTfibtz. 



Architecture template j 



I/O, External Control 




1 


r 


^ Simulation 



11 



Performance 
Feedback from 
Implementation 
En vironment 



SUBSTITUTE SHEET 



WO 97/13209 PCT/SE96/01236 

2/6 



Textfile 



Fig. 2a 



1 5a 



Preprocessor 

' Syntax check I 
! Parser tree generation ; 

|" Generation of j 
| error messages ; 

1 5b 



i Generation'of"""" is c 
; graphical represenracion , 

|" "Generation of 
►! executable code 
! for simulation 



Structural VHDL 
Architecture Netlist 



21 



2la^ 



21b - 
21c 



21d 



5d 



Code generation,. 
ASCII-table. VHDL 



Core generation 



VHDL-Model of core 



Link VHDL-Model 
of CU, DP AP Mem etc. 



VHDL-Lib 



Implementation 
Independent 
VHDL 



Generator 
Information 



SUBSTITUTE SHEET 



WO 97/13209 

PCT/SE96/01236 

3/6 



( Start ^20? Fig. 2b 



Read Textfile 



Syntax check 
Generate parser tree 



203 



205 



209 



Display error 
messages 

( Stop ) 
21 1 



Textfile 
formally 
Qprrecu^ 
Yes 



,207 



235 

/ 



-CI 

Stop 



; Generate graphical'] 
representation of ! 
architecture 2 ' 3 



^'"Architecture" — ' 2 1 5 
.equal to desired? > 

jYes - 



No 



217 



Extract information 
from textfile: 
Operations, 

Registers, memory 
Conn ectivity 



Generate VHDL- 
description of 
hardware 



( 

237 



2_ 



231 



Extract information 
from textfile: 
Map operations 
on Instructions 
Generate symbolic 
microcode 



1 



221 

\ 



239 

1 



Convert microcode 
to ASC/l-table of 
" 1 " and "0" in 
VHDL-format 



Convert textfile to C++-code: 
Serialize parallel operations 

Add cycle count and 
performance measurement 
parameters 
Add I/O, 
asynchronous external control, 
main program 



f 
223 



Compile complete 
C+-f-program 



Combine description of 
hardware and microcode 
to complete VHDL^escription 



i 

233 



( Stop ) 



T 

225 



Link Fix Class Library 
(Behaviour modelsl 



6NS0OC1D: <WO 071 320QA1 JL> 



SUBSTITUTE SHEET 



WO 97/13209 



4/6 



PCT/SE96/01236 



Fig. 3 



INT.2 
INT.A 

Sl_X 
EI,X 



.READY 

DATA : 

ED 



21 



Processor 
Core 



so_x 

OREQJC 
JEO.X 

RESET 



MCL 



Fig. 4 



Memn 



Mem2 



Meml 



39 



DPn 



DP2 



DPI 



Local Bus 



APn 



3:> 







• 
• 






A 


P2 














A 


PI 















Control 

cu 

Status 



i 



External 
Control 



Status 



31 



Pmem 



I/O 



I/O 



I/O 



SUBSTITUTE SHEET 



WO 97/13209 

PCT/SE96/01236 

5/6 




nuorv-w-irv ^A/n oTirraii i „ 



SUBSTITUTE SHEET 



WO 97/13209 



PCT/SE96/01236 



6/6 




u 



Sacc 



EO 



SO 



3^ \/ 



Oreg 



+ 

— 
Acc 



Mai 


eg 




* 



Mereg 











Ireg 





Fig. 6 



Acc 


= Sacc + Ireg; 




x->v-^ Sacc 


Acc 


- — OCT 




- — «Ireg 



Fig. 7a 
Fig. 7b 



SUBSTITUTE SHEET 



1 

INTERNATIONAL SEARCH REPORT 



International application No. 

PCT/SE 96/01236 



A. CLASSIFICATION OF SUBJECT MATTER 



IPC6: G06F 17/50 

Accord.ng to I nternational Puent Clarification (IPC) or Co bolh national chcification an d IPC 
B. FIELDS SEARCHED ~ ~~ 



Minimum documentor, (evened (clatxificauon ryrtem followed by dattirtcation tymboU) 

IPC6: G06F 



Documentor, *arched other than minimum documentation to the extent that cuch documenu are included in the field, cearched 

SE,DK,FI,N0 classes as above 



Electronic dau ba,e coruulted during the international «arch (name of data ba t e and. where practicable, «arch term, ured) 

mSSh^SsJKSSig^Si DATABASE SCSEARCH, INFORMATION SCIENCE 



C DOCUMENTS CONSIDERED TO BE RELEVANT 



Category* 



Gution of document, with indication, where appropriate, of the relevant passages 



IEEE JOURNAL OF SOLID-STATE CIRCUITS, Volume 24 

No 2, April 1989, BAHER S. HAROUN ET AL, "SPAID: 
An Architectural Synthesis Tool for DSP Custom 
Applications", page 426 - page 427, figure 1 



IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, Volume 
9, No 9, Sept 1990, GERT GOOSSENS ET AL, "An 
Efficient Microcode Compiler for Application 
Specific DSP Processors", page 925 - paqe 926 
figure 1 



Department of Applied Electronics, Volume, 1996 

(Sweden), Viktor Owall, "Synthesis of Controllers 

from a Range of Controller Architectures" 

page 102 - page 105, "8.1"cited in the application 



[x] F»rlher documents are listed in the continuation of Box C. [~ j See patent family annex 



Relevant to claim No 



1-5 



1-5 



1-5 



* Special categories of cited document* 

*A* document coining me general state of the art which is not coandered 

to be of particular rdcvancc 
*E* crlier cfecument but published oo or After the iiiterriaticmal filing date 

cited to establish the publication date of another duaon or other 
special reason (as specified) 

'O m document referring to an oral dbdosare, use; exhibitiaa or other 
means 

•P* document publiihed prior to the international filing dau but later than 
the priority date claimed 



Date of the actual completion of the international search 
18 FgbrnapY ]QQJ 



T* later documcni published after the international filing date or priority 
date and not in conflict with the application but cited to understand 
the principle or theory underlying the invention 

•X* docum ent of particular rdevaocc the daimed invention cannot be 
considered novel or cannot be considered to involve an inventive 
step when the document is taken alone 

*Y* documcni of particular rdevaocc: the daimed invention cannot be 
considered to involve an inventive step when the document is 
combined with one or more other such documents, such combination 
being obvious to a person skilled in the art 

'&* document member of the same patent family 



Name and mailing address of the ISA/ 
Swedish Patent Office 
Box 5055, S-102 42 STOCKHOLM 
Facsimile No. +46 8 666 02 86 



Form PCT/ISA/210 (second sheet) (July 1992) 



Dale of mailing of the international search report 

25 -02- 1997 



Authorized officer 



Jan Silfverl ing 

Telephone No. + 46 8 782 25 00 



INTERNATIONAL SEARCH REPORT 



International application No. 

PCT/SE 96/01236 



C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT 



Category* 



Gtation of document, with indication, where appropriate, of the relevant passages 



Relevant to claim No. 



IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN, Volume 

10, No 4, April 1991, C. BERNARD SHUNG ET AL, "An 
Integrated CAD System for Algorithm-Specific IC 
Design", page 447 - page 449, figure 1, 
cited in the application 



DOCTORAL DISSERTATION, Volume, May 1990, (Sweden), 
Mats Torkelson, "Design of Application Specific 
Digital Signal Processors", part 2, page 3-23; 
part 6, page 585-586"2"; part 9, page 
5 0 " I mp 1 ement at ion" 



1-5 



1-5 



Form PCT/ISA/210 (communion of second cheel) (July 1992) 



THIS PAGE BLANK (USPTO) 



This Page is Inserted by IFW Indexing and Scanning 
Operations and is not part of the Official Record 



Defective images within this document are accurate representations of the original 
documents submitted by the applicant. 

Defects in the images include but are not limited to the items checked: 



□ IMAGE CUT OFF AT TOP, BOTTOM OR SIDES 

□ FADED TEXT OR DRAWING 

□ BLURRED OR ILLEGIBLE TEXT OR DRAWING 

□ SKEWED/SLANTED IMAGES 

□ COLOR OR BLACK AND WHITE PHOTOGRAPHS 
. □ GRAY SCALE DOCUMENTS 

□ LINES OR MARKS ON ORIGINAL DOCUMENT 

□ REFERENCE(S) OR EXHIBIT(S) SUBMITTED ARE POOR QUALITY 

□ OTHER: __ 

IMAGES ARE BEST AVAILABLE COPY. 
As rescanning these documents will not correct the image 
problems checked, please do not report these problems to 
the IFW Image Problem Mailbox. 



BEST AVAILABLE IMAGES 




BLACK BORDERS 



THIS PAGE BLANK (uspto 



